Abstract

The article describes a model of chess based on information theory. A
mathematical model of the partial depth scheme is outlined, and a formula
for the partial depth added for each ply is derived from the principles
of the model. An implementation of alpha-beta with partial depth is
given. The method is tested with an experimental strategy whose objective
is to show the effect of allocating more search resources to the areas of
the search tree with higher information. The search proceeds in the
direction of lines with higher information gain. The effect on search
performance of allocating more search resources to lines with higher
information gain is tested experimentally and conclusive results are
obtained. In order to isolate the effects of the partial depth scheme,
no other heuristic is used.



1 Introduction 



1.1 Motivation 

There is a gap in the scientific analysis of fractional ply methods, which are
among the best search methods in computer chess and other strategy games. As
Hans Berliner pointed out about the scheme of "partial depths", "...the success
of these micros (micro-processor based programs) attests to the efficacy of the
procedure. Unfortunately, little has been published on this". The objective of
this research is to develop a theoretical model of the partial depth scheme
based on information theory, to implement it, and to provide experimental
evidence for the method and for the model.

1.2 The research methodology and scenario 

An introduction to game theory and information theory is given in the
background section. A model based on the principles of information theory is
outlined and then the formula for the partial depth scheme is derived. Search
experiments are performed and their results are interpreted. The appendix
contains an introduction to some concepts in chess and to the axioms of
information theory.

1.3 Background knowledge 

1.3.1 The game theory model of chess

An important branch of mathematics for modeling chess is game theory, the
study of strategic interactions.

Definition 1 Assuming the game is described by a tree, a finite game is a game 
with a finite number of nodes in its game tree. 

It has been proven that chess is a finite game. The rule of a draw at three
repetitions and the 50-move rule ensure that chess is a finite game.

Definition 2 Sequential games are games where players have some knowledge 
about earlier actions. 

Definition 3 A game is of perfect information if all players know the moves
previously made by all players.

Zermelo proved that in chess either player I has a winning pure strategy,
player II has a winning pure strategy, or either player can force a draw.

Definition 4 A zero-sum game is a game where what one player loses the
other wins.

Chess is a two-player, zero-sum, perfect information game, a classical model
of many strategic interactions.

By convention, W is the white player in chess because it moves first, while
B is the black player because it moves second. Let $M(x)$ be the set of moves
possible after the path $x$ in the game has been played. W chooses his first
move $w_1$ in the set $M$ of moves available. B chooses his move $b_1$ in the
set $M(w_1)$: $b_1 \in M(w_1)$. Then W chooses his second move $w_2$ in the set
$M(w_1, b_1)$: $w_2 \in M(w_1, b_1)$. Then B chooses his second move $b_2$ in
the set $M(w_1, b_1, w_2)$: $b_2 \in M(w_1, b_1, w_2)$. At the end, W chooses
his last move $w_n$ in the set $M(w_1, b_1, \ldots, w_{n-1}, b_{n-1})$.

In consequence $w_n \in M(w_1, b_1, \ldots, w_{n-1}, b_{n-1})$.

Let $n$ be a finite integer and let $M$, $M(w_1)$, $M(w_1, b_1)$, ...,
$M(w_1, b_1, \ldots, w_{n-1}, b_{n-1}, w_n)$ be any successively defined sets
for the moves $w_1, b_1, \ldots, w_n, b_n$ satisfying the relations:

$$ b_n \in M(w_1, b_1, \ldots, w_{n-1}, b_{n-1}, w_n) \tag{1} $$

and

$$ w_n \in M(w_1, b_1, \ldots, w_{n-1}, b_{n-1}) \tag{2} $$

Definition 5 A realization of the game is any 2n-tuple
$(w_1, b_1, \ldots, w_{n-1}, b_{n-1}, w_n, b_n)$ satisfying the relations (1)
and (2).

A realization is called a variation in the game of chess.

Let $R$ be the set of realizations (variations) of the chess game. Consider
a partition of $R$ into three sets $R_w$, $R_b$ and $R_{wb}$ such that for any
realization in $R_w$ player 1 (White in chess) wins the game, for any
realization in $R_b$ player 2 (Black in chess) wins the game, and for any
realization in $R_{wb}$ there is no winner (it is a draw in chess).

Then $R$ can be partitioned into 3 subsets so that

$$ R = R_w + R_b + R_{wb} \tag{3} $$



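W has a winning strategy if
$\exists w_1 \in M$, $\forall b_1 \in M(w_1)$,
$\exists w_2 \in M(w_1, b_1)$, $\forall b_2 \in M(w_1, b_1, w_2)$, ...,
$\exists w_n \in M(w_1, b_1, \ldots, w_{n-1}, b_{n-1})$,
$\forall b_n \in M(w_1, b_1, \ldots, w_{n-1}, b_{n-1}, w_n)$, where the variation

$$ (w_1, b_1, \ldots, w_n, b_n) \in R_w \tag{4} $$

W has a non-losing strategy if
$\exists w_1 \in M$, $\forall b_1 \in M(w_1)$,
$\exists w_2 \in M(w_1, b_1)$, $\forall b_2 \in M(w_1, b_1, w_2)$, ...,
$\exists w_n \in M(w_1, b_1, \ldots, w_{n-1}, b_{n-1})$,
$\forall b_n \in M(w_1, b_1, \ldots, w_{n-1}, b_{n-1}, w_n)$, where the variation

$$ (w_1, b_1, \ldots, w_n, b_n) \in R_w + R_{wb} \tag{5} $$

B has a winning strategy if
$\forall w_1 \in M$, $\exists b_1 \in M(w_1)$,
$\forall w_2 \in M(w_1, b_1)$, $\exists b_2 \in M(w_1, b_1, w_2)$, ...,
$\forall w_n \in M(w_1, b_1, \ldots, w_{n-1}, b_{n-1})$,
$\exists b_n \in M(w_1, b_1, \ldots, w_{n-1}, b_{n-1}, w_n)$, where the variation

$$ (w_1, b_1, \ldots, w_n, b_n) \in R_b \tag{6} $$

B has a non-losing strategy if
$\forall w_1 \in M$, $\exists b_1 \in M(w_1)$,
$\forall w_2 \in M(w_1, b_1)$, $\exists b_2 \in M(w_1, b_1, w_2)$, ...,
$\forall w_n \in M(w_1, b_1, \ldots, w_{n-1}, b_{n-1})$,
$\exists b_n \in M(w_1, b_1, \ldots, w_{n-1}, b_{n-1}, w_n)$, where the variation

$$ (w_1, b_1, \ldots, w_n, b_n) \in R_b + R_{wb} \tag{7} $$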

Theorem 1 Considering a game obeying the conditions stated above, each
of the following three statements is true:

(i) W has a winning strategy or B has a non-losing strategy.

(ii) B has a winning strategy or W has a non-losing strategy.

(iii) If $R_{wb} = \emptyset$, then W has a winning strategy or B has a
winning strategy.

If $R_{wb}$ is $\emptyset$, one of the players will win, and if $R_{wb}$ is
identical with $R$ the game will result in a draw at perfect play from both
sides. The outcome of the game of chess at perfect play is not yet known.
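The existence statement of Theorem 1 can be illustrated on a very small finite game by backward induction. The following is only a sketch under stated assumptions: the nested-list tree encoding, the names `solve`, `with_draws` and `no_draws`, and the two toy trees are hypothetical and do not come from the article or from chess.

```python
# Outcomes from White's point of view; leaves of the toy trees hold one of these.
WHITE_WINS, DRAW, BLACK_WINS = 1, 0, -1

def solve(node, white_to_move=True):
    """Backward induction on a finite game tree given as nested lists."""
    if not isinstance(node, list):      # terminal node: outcome of the realization
        return node
    values = [solve(child, not white_to_move) for child in node]
    # White picks the best outcome for himself, Black the worst one for White.
    return max(values) if white_to_move else min(values)

# Hypothetical 2-ply trees: White moves first, then Black replies.
with_draws = [[WHITE_WINS, DRAW], [BLACK_WINS, WHITE_WINS]]
no_draws = [[WHITE_WINS, WHITE_WINS], [BLACK_WINS, WHITE_WINS]]

print(solve(with_draws))  # game value of the first tree
print(solve(no_draws))    # game value of the second tree
```

In the first tree neither side can force a win, so both have a non-losing strategy; in the second tree no realization is a draw and one side has a winning strategy, in line with statement (iii).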

The previous theorem proves the existence of winning and non-losing
strategies, but gives no method for finding them. One method would be to
transform the game model into a computational problem and solve it by
computational means. Because the state space of the problem is very large,
the players will in general not have full control over the game and often
will not know precisely the outcome of the strategies chosen. The information
gained in the search over the state space is the information used to take the
decision. The quality of the decision must be a function of the information
gained, as is the case in economics and as is expected from intuition.

1.3.2 Concepts in information theory 

Information theory is of critical importance in the model described. It is
therefore proper to give a short outline of the information theory concepts
used in the information theoretic model of strategy games, and in particular
of chess and computer chess.

Definition 6 A discrete random variable $X$ is completely defined by the
finite set $S$ of values it can take and the probability distribution
$\{P_X(x)\}_{x \in S}$. The value $P_X(x)$ is the probability that the random
variable $X$ takes the value $x$.

Definition 7 The probability distribution $P_X : S \to [0,1]$ is a
non-negative function that satisfies the normalization condition

$$ \sum_{x \in S} P_X(x) = 1 \tag{8} $$

Definition 8 The expected value of $f(x)$ may be defined as

$$ \sum_{x \in S} P_X(x) f(x) \tag{9} $$

The following definition of entropy may be seen as a consequence of the axioms
of information theory. It may also be defined independently [33]. Entropy has
a very important role in science and in engineering: it is a fundamental
concept of the mathematical theory of communication, of the foundations of
thermodynamics, of quantum physics and of quantum computing.

Definition 9 The entropy of a discrete random variable $X$ with probability
distribution $p(x)$ may be defined as

$$ H_X = -\sum_{x \in S} p(x) \log p(x) \tag{10} $$
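As a small numerical illustration of Definition 9, the sketch below evaluates the entropy of three distributions over four outcomes. The function name `entropy`, the choice of base 2 and the example distributions are assumptions made for illustration only.

```python
import math

def entropy(probs, base=2.0):
    """Shannon entropy -sum p*log(p); zero-probability terms contribute 0."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p, base) for p in probs if p > 0.0)

uniform = [0.25, 0.25, 0.25, 0.25]   # all outcomes equally likely
skewed = [0.7, 0.1, 0.1, 0.1]        # one outcome dominates
certain = [1.0, 0.0, 0.0, 0.0]       # no uncertainty at all

for name, dist in [("uniform", uniform), ("skewed", skewed), ("certain", certain)]:
    print(name, entropy(dist))
```

The uniform distribution has the largest entropy of the three and the certain outcome has entropy zero.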

Entropy is a relatively new concept, yet it is already used as the foundation
for many scientific fields. This article creates the foundation for the use
of information in computer chess and in computer strategy games in general.
Moreover, the concept of entropy must be fundamental to any search process
where decisions are taken.

Some of the properties of entropy used to measure the information content
in many systems are the following:

Proposition 1 (Non-negativity of entropy)

$$ H_X \ge 0 \tag{11} $$

Interpretation 1 Uncertainty is always greater than or equal to 0. If the
entropy $H$ is 0, the uncertainty is 0 and the random variable $X$ takes a
certain value with probability $P(x) = 1$.

Proposition 2 Consider all probability distributions on a set $S$ with $m$
elements. $H$ is maximum if all events $x$ have the same probability,
$p(x) = 1/m$.

Proposition 3 If $X$ and $Y$ are two independent random variables, then

$$ P_{X,Y}(x, y) = P_X(x) P_Y(y) \tag{12} $$

Proposition 4 The entropy of a pair of independent variables $X$ and $Y$ is

$$ H_{X,Y} = H_X + H_Y \tag{13} $$

Proposition 5 For a pair of random variables one has in general

$$ H_{X,Y} \le H_X + H_Y \tag{14} $$

Proposition 6 Additivity of composite events 

The average information associated with the choice of an event x is additive, 
being the sum of the information associated to the choice of subset and the 
information associated with the choice of the event inside the subset, weighted 
by the probability of the subset 

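Definition 10 The entropy rate of a sequence $X_N = \{X_t\}_{t \in \mathbb{N}}$ is

$$ h_X = \lim_{N \to \infty} \frac{H_{X_N}}{N} \tag{15} $$

Definition 11 Mutual information is a way to measure the correlation of two
variables:

$$ I_{X,Y} = -\sum_{x,y} p(x,y) \log \frac{p(x)\,p(y)}{p(x,y)} \tag{16} $$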

All the equations and definitions presented have a very important role in the 
model proposed as will be seen later in the article. 

Proposition 7

$$ I_{X,Y} \ge 0 \tag{17} $$

Proposition 8

$$ I_{X,Y} = 0 \tag{18} $$

if and only if $X$ and $Y$ are independent variables.
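Propositions 7 and 8 can be checked numerically on small joint distributions. The sketch below uses the standard form $\sum p(x,y) \log [p(x,y) / (p(x) p(y))]$ of mutual information; the function name `mutual_information`, base 2 and the two joint tables are illustrative assumptions.

```python
import math

def mutual_information(joint):
    """Mutual information in bits of a joint table joint[i][j] = p(x_i, y_j)."""
    px = [sum(row) for row in joint]          # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]    # marginal distribution of Y
    total = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0.0:                     # zero cells contribute nothing
                total += pxy * math.log2(pxy / (px[i] * py[j]))
    return total

independent = [[0.25, 0.25], [0.25, 0.25]]   # p(x, y) = p(x) p(y)
correlated = [[0.5, 0.0], [0.0, 0.5]]        # Y is fully determined by X

print(mutual_information(independent))
print(mutual_information(correlated))
```

The independent table gives zero mutual information, while the fully correlated table gives one bit.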

1.4 Previous research in the field 

A necessary condition for a truly selective search given by Hans Berliner is
that the search follows the areas with the highest information in the tree [29]:
"It must be able to focus the search on the place where the greatest information
can be gained toward terminating the search". Berliner describes the essential
role played by information in chess; however, he does not formalize the concept
of information in chess as an information theoretic concept. From the
perspective of the depth of understanding of the decision process in chess, the
article [29] is exceptional, but it does not formulate its insight in a
mathematical frame. It contains great chess and computer chess analysis, but it
does not express the method in mathematical definitions, concepts and equations.
Mark Winands in [45] outlines a method based on fractional depth where
the fractional ply $FP$ of a move with a category $c$ is given by

$$ FP = \frac{\lg P_c}{\lg C} \tag{19} $$

where $P_c$ is the probability of a move of category $c$ and $C$ is a constant.
His approach is experimental and based on data mining, like the method
presented previously.



In the article [46], David Levy, David Broughton and Mark Taylor describe
the selective extension algorithm. The method is based on "assigning an
appropriate additive measure for the interestingness of the terminal node" of a
path.

Consider a path in a search tree consisting of the moves $M_i$, $M_{ij}$,
$M_{ijk}$, the resulting position being a terminal node. The probability that
a terminal node in that path is in the principal continuation is

$$ P(M_i) \cdot P(M_{ij}) \cdot P(M_{ijk}) \tag{20} $$

The measure of the "interestingness" of a node in this method is

$$ \lg[P(M_i)] + \lg[P(M_{ij})] + \lg[P(M_{ijk})] \tag{21} $$
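The additive measure is simply the logarithm of the path probability, which a few lines of code make explicit. This is a sketch only: the function names, the reading of lg as a base-2 logarithm and the move probabilities are assumptions, not values from [46].

```python
import math

def path_probability(move_probs):
    """Product of the move probabilities along a path, as in (20)."""
    prob = 1.0
    for p in move_probs:
        prob *= p
    return prob

def interestingness(move_probs):
    """Sum of the logarithms of the move probabilities, as in (21); lg assumed base 2."""
    return sum(math.log2(p) for p in move_probs)

# Hypothetical probabilities for the moves M_i, M_ij, M_ijk of one path.
path = [0.5, 0.25, 0.5]

print(path_probability(path))
print(interestingness(path))
print(math.log2(path_probability(path)))  # same number as the line above
```

Because the measure is additive, it can be accumulated move by move during the search instead of multiplying probabilities.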

1.5 Analysis of the problem

The problem is to describe the mathematical meaning of information in
computer chess, to develop the principles and formulas that can be used to
control the search, and to provide experimental evidence for the search
heuristic as well as for the role of information gain in obtaining good
results at an acceptable cost.

1.6 Contributions 

The contributions of this research are the creation of the information
theoretic model for search in computer chess, the description of the
information gain in computer chess, and a scientific explanation of the partial
depth scheme. The paths explored are the areas of the search tree with the
highest information gain. Other contributions are the calculation of the
information gain for important moves, the derivation of a formula describing
the size of the ply added for various moves, and the experimental evidence
given for the effect of information gain on search for chess problems.



2 Search and decision methods in computer chess 

Search on informed game trees 



The use of heuristic information in the sense of upper and lower bounds is
introduced in [35], but no reference to any information theoretic concept is
given. The information theoretic model would actually consider a distribution,
not only an interval as in [35]. Wim Pijls and Arie de Bruin presented an
interpretation of heuristic information based on lower and upper estimates for
a node and integrated it in alpha-beta, proving at the same time the
correctness of the method under the following specifications.

Consider the specification of the procedure alpha-beta. If the parameters are
the following:

(1) n, a node in the game tree,

(2) alpha and beta, two real numbers, and

(3) f, a real number, the output parameter,

and the conditions are:

(1) pre: $alpha < beta$

(2) post:

$alpha < f < beta \Rightarrow f = f(n)$,
$f \le alpha \Rightarrow f(n) \le f \le alpha$,
$f \ge beta \Rightarrow f(n) \ge f \ge beta$,

then

Theorem 2 The procedure alpha-beta (defined with heuristic information, but
not quantified as in information theory) meets the specification [35].



Considering the representation given by [35], assume that for some game trees
heuristic information on the minimax value $f(n)$ is available for any node.

Definition 12 The information may be represented as a pair $H = (U, L)$, where
$U$ and $L$ map nodes of the tree into real numbers.

Definition 13 $U$ is a heuristic function representing the upper bound on the
node.

Definition 14 $L$ is a heuristic function representing the lower bound on the
node.

For every internal node $n$ the condition $U(n) \ge f(n) \ge L(n)$ must be
satisfied.

For any terminal node $n$ the condition $U(n) = f(n) = L(n)$ must be satisfied.
This may even be considered as a condition for a leaf.

Definition 15 A heuristic pair $H = (U, L)$ is consistent if
$U(c) \le U(n)$ for every child $c$ of a given max node $n$ and
$L(c) \ge L(n)$ for every child $c$ of a given min node $n$.
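The consistency condition of Definition 15 is easy to check mechanically. The sketch below assumes non-strict inequalities (child upper bound at most the parent's at max nodes, child lower bound at least the parent's at min nodes); the `Node` class, the function `is_consistent` and the two toy trees are hypothetical illustrations, not the data structures of [35].

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    is_max: bool                      # True for a max node, False for a min node
    upper: float                      # U(n), heuristic upper bound
    lower: float                      # L(n), heuristic lower bound
    children: List["Node"] = field(default_factory=list)

def is_consistent(node):
    """Check Definition 15 on the whole subtree rooted at node."""
    for child in node.children:
        if node.is_max and child.upper > node.upper:
            return False              # a child of a max node exceeds U(n)
        if not node.is_max and child.lower < node.lower:
            return False              # a child of a min node falls below L(n)
        if not is_consistent(child):
            return False
    return True

# Toy trees with made-up bounds.
good = Node(True, 5, 1, [Node(False, 4, 2, [Node(True, 3, 3)]), Node(False, 5, 1)])
bad = Node(True, 5, 1, [Node(False, 7, 0)])

print(is_consistent(good), is_consistent(bad))
```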



The following theorem, published and proven in [35], relates the information
of alpha-beta and the set of nodes visited.

Theorem 3 Let $H_1 = (U_1, L_1)$ and $H_2 = (U_2, L_2)$ denote heuristic pairs
on a tree $G$, such that $U_1(n) \le U_2(n)$ and $L_1(n) \ge L_2(n)$ for any
node $n$. Let $S_1$ and $S_2$ denote the sets of nodes that are visited during
execution of the alpha-beta procedure on $G$ with $H_1$ and $H_2$ respectively.
Then $S_1 \subseteq S_2$.
2.1 Problem formulation

In the light of the new description it is possible to reformulate the search
problem in a strategy game. The problem is to plan the search process so as to
minimize the entropy of the value of the starting position, subject to limits
on cost. The best case is when the entropy, or uncertainty in the value of a
position, becomes 0 at an acceptable search cost. This is feasible in chess:
it happens every time a successful combination is executed and results in mate
or significant advantage.

It is possible to formulate the problem of search in computer chess and in
other games as a problem of entropy minimization,

$$ \min\{H(position)\} = \min\Big\{-\sum_{i=1}^{\infty} P_i \log P_i\Big\} \tag{22} $$

subject to a limit on the number of positions that can be explored.

3 The model 

Assumption 1 The entropy of a position can be approximated by the sum of the
entropy rates of the pieces minus the entropy reduction due to the strategic
configurations.

This can be expressed as:

$$ H_{trajectory}(position) = \sum_{i=1}^{N} H_{p_i} - \sum_{i} H_{s_i} \tag{23} $$

where $H_{p_i}$ represents the entropy of a piece and $H_{s_i}$ represents the
entropy of a structure with possible strategic importance.

This also gives a more general perspective on the meaning of a game piece.
A game piece can be seen as a stochastic function that takes the state of the
board as input and generates possible trajectories and their associated
probabilities. These probabilities form a distribution with an associated
uncertainty.

The entropy of a positional pattern, strategic or tactical, may be considered a
form of joint entropy of the set of variables represented by the positions of
the pieces and their trajectories. The pieces forming a strategic or tactical
pattern have correlated trajectories, which may be considered as forming a plan.

$$ H(s_i) = -\sum \cdots \sum P(s_i) \log[P(s_i)] \tag{24} $$

$$ H_{s_i} = H(s_i) \tag{25} $$

where $s_i$ is a subset of pieces involved in a strategic pattern and the
probabilities $P(s_i)$ represent the probability of realization of such a
strategic or tactical pattern. The reduction of entropy caused by strategic and
tactical patterns such as double attacks and pins is determined both by the
frequency of such structures and by a significant increase in the probability
that one of the sides will win after this position is realized.

We may consider the pieces undertaking a common plan as a form of correlated
subsystems with mutual information I(piece1, piece2, ...). It follows that
undertaking a plan may result in a decrease in entropy and a decrease in the
need to calculate each variation. It is known from practice that planning
decreases the need to calculate each variation, and this gives an experimental
indication of the practical importance of the concept of entropy as it is
defined here in the context of chess. Each of the tactical procedures (pins,
forks, double attacks, discovered attacks and so on) can be understood formally
in this way. A large reduction in the uncertainty about the outcome of the game
occurs, as the odds are often that such a structure will result in a decisive
gain for a player. When such a structure appears as a choice, it is likely that
a rational player will choose it with high probability.

The entropy of these structures may be calculated with a data mining approach
that determines how frequently they appear in games.

An approximation that does not consider the strategic structures would be:

Assumption 2

$$ H_{trajectory}(position) = \sum_{i=1}^{N} H_{p_i} \tag{26} $$

Assumption analysis: The entropy of the position is in general smaller than
the sum of the entropies of the pieces, because certain positional patterns
such as openings, end-games and various pawn configurations in a chess position
result in a smaller number of combinations, in order, and in a smaller entropy.
Closer to reality would be the statement:

$$ H_{trajectory}(position) \le \sum_{i=1}^{N} H_{p_i} \tag{27} $$

4 Results

4.1 A definition of information gain in computer chess

It is possible to define the information gain during the search process based on
the reduction in uncertainty in the following way:

$$ I_{gain} = -\Delta H \tag{28} $$

where $H$ represents the uncertainty in the value of the position and

$$ \Delta H = H_2 - H_1 \tag{29} $$

represents the variation of uncertainty in the current position after a move
is made; $I_{gain}$ is the information gained after making the move.
In the case when

$$ \Delta H < 0 \tag{30} $$

we speak of information gain; if

$$ \Delta H > 0 \tag{31} $$

we understand information lost through approximate evaluation or another
operation.

It is possible to describe the information gain of the search process by
defining the heuristic efficiency

$$ HE = \frac{I_{gain}}{\Delta Nodes} = \frac{-\Delta H}{\Delta Nodes} \tag{32} $$

with $\Delta H = H_2 - H_1$ as in (29). When $\Delta Nodes \to 1$, the
information gain resulting from a move is

$$ I_{gain}(Move) = H(beforeMove) - H(afterMove) \tag{33} $$

This concept may be considered similar to the concept of information gain for
decision trees and to the Kullback-Leibler divergence. The same principle
applies here: the higher the difference between entropies, the higher the
information gain. This makes sense intuitively, and it provides a new
theoretical justification for the empirical heuristics of chess and computer
chess.
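The definitions of information gain and heuristic efficiency can be made concrete with a small numerical sketch. Everything specific here is an assumption for illustration: the function names, the use of a three-outcome (win, draw, loss) distribution as the "value of the position", and the probabilities themselves.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def information_gain(before, after):
    """H(beforeMove) - H(afterMove); positive when uncertainty decreases."""
    return entropy(before) - entropy(after)

def heuristic_efficiency(before, after, nodes_expanded):
    """Information gained per node expanded."""
    return information_gain(before, after) / nodes_expanded

# Hypothetical (win, draw, loss) beliefs before and after a forcing move.
before_move = [1 / 3, 1 / 3, 1 / 3]
after_move = [0.9, 0.05, 0.05]

gain = information_gain(before_move, after_move)
print(gain)
print(heuristic_efficiency(before_move, after_move, nodes_expanded=10))
```

A move that leaves the distribution unchanged has zero gain, and a move that makes the outcome almost certain recovers most of the initial entropy.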



4.2 The justification of the partial depths scheme using 
the information theoretic model 

The partial depth method is a generalization of the classic alpha-beta in that
it gives greater importance to moves considered significant for the search. If
all moves have the same importance, the partial depth scheme reduces to the
ordinary alpha-beta scheme. It can also be described as an importance sampling
search. The partial depth scheme has been used by various authors. As Hans
Berliner observed, little has been published about this method [29]. The
contribution of this article goes in this direction.

It is possible to define a function returning the depth:

$$ \Delta depth = f(path) \tag{34} $$

This is a generalization of the classic alpha-beta because in classic
alpha-beta $\Delta depth = constant$. If the decision to add a certain depth to
the path depends only on the current move and position, that is, if
$\Delta path \to 1$, then the decision depends only on the current position.

The increase in depth depends on the path in this method, where the path is
composed of the moves $m_1, m_2, m_3, \ldots$. In the classic alpha-beta the
depth increase is constant regardless of the type of move.
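The difference between a constant and a move-dependent depth increase can be sketched on a toy tree. This is a minimal illustration, not the article's implementation: the dictionary tree encoding, the function names, the move kinds "check" and "quiet", the cost 0.25 for a check and the node values are all assumptions.

```python
import math

def leaf(value):
    return {"value": value, "moves": []}

def alpha_beta_partial(node, budget, alpha, beta, maximizing, ply_cost):
    """Alpha-beta where each move consumes ply_cost(kind) of the depth budget."""
    if budget <= 0 or not node["moves"]:
        return node["value"]                      # static value of the node
    best = -math.inf if maximizing else math.inf
    for kind, child in node["moves"]:
        value = alpha_beta_partial(child, budget - ply_cost(kind),
                                   alpha, beta, not maximizing, ply_cost)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:                         # cutoff
            break
    return best

# Toy tree: a forcing line (check, forced reply, check) ends in a win worth 100.
a1 = {"value": 0, "moves": [("check", leaf(100))]}
a = {"value": 0, "moves": [("quiet", a1)]}
b = {"value": 1, "moves": [("quiet", leaf(1))]}
root = {"value": 0, "moves": [("check", a), ("quiet", b)]}

def uniform(kind):
    return 1.0                                    # classic alpha-beta: constant step

def fractional(kind):
    return 0.25 if kind == "check" else 1.0       # assumed partial depth for checks

def search(tree, budget, ply_cost):
    return alpha_beta_partial(tree, budget, -math.inf, math.inf, True, ply_cost)

print(search(root, 2.0, uniform))     # the forcing line is cut off before the win
print(search(root, 2.0, fractional))  # the same budget reaches the win
```

With the same depth budget, the uniform step stops before the end of the forcing line, while the fractional step follows it to the decisive node. A strictly positive cost per move is used here so that the recursion always terminates.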
4.3 The design of a search method with optimal cost for
a certain information gain

The principle behind a theory of optimal search should be the allocation of
search resources based on the optimality of information gain per cost. It
follows that the fraction of a search ply added to the depth of the path by a
move should decrease with the quality of the move. The standard approach gives
equal importance to all moves, whereas the fractional ply method gives more
importance to significant moves. A quantitative measure for the quality of a
move must therefore be defined. The reduction from the normal depth of 1 ply
should be proportional to this quantitative measure of the quality of a move.

The fractional ply FD must decrease with the quality of the move relative to
optimal. In this system the fractional ply added equals a full ply reduced in
proportion to the approximate entropy reduction achieved by that move, relative
to the move having the highest entropy reduction.

For instance, for the capture of a rook the entropy reduction is log 14.

Axiom 1 An axiom of efficient search in chess, in computer chess, and of
efficient search in general should be that the probability of executing a move
must be equal to the heuristic efficiency of that move, which is equal to the
information efficiency of expanding the node resulting from the move. The same
principle can be considered in general for trajectories.

By notation, let the heuristic efficiency be $HE$ and let $P_{c_i}$ be the
probability that a move in category $c_i$ is executed. The heuristic efficiency
is a fundamental measure of the ability of a search procedure to gain
information from the state space. In this analysis the heuristic efficiency
depends on the categories of moves and trajectories defined. The examples are
for moves with individual tactical values; however, the analysis can also be
extended to tactical plans generated by pins, forks and other tactical patterns.
Because such an analysis would require some readers to look up the meaning of
these structures in chess books, and because of space considerations, the moves
generating such configurations are not presented as examples. No additional
theoretical difficulties would emerge from the introduction of these move
categories. The same applies to strategic elements.

Following the principles outlined, a formula for the fractional ply can be
derived.



$$ P_{c_i} = k \cdot HE \tag{35} $$

Considering that

$$ HE = \frac{I_{gain}}{\Delta Cost} \tag{36} $$

and

$$ \frac{I_{gain_i}}{\Delta Cost} = \frac{\Delta Entropy_{category_i}}{\Delta Cost} \tag{37} $$

where $\Delta Entropy_{category_i}$ denotes the entropy reduction achieved by a
move of category $i$, it means

$$ P_{c_i} = k \cdot \frac{\Delta Entropy_{category_i}}{\Delta Cost} \tag{38} $$

For $k = 1$,

$$ P_{c_i} = \frac{\Delta Entropy_{category_i}}{\Delta Cost} \tag{39} $$

from this,

$$ \Delta Cost = \frac{\Delta Entropy_{category_i}}{P_{c_i}} \tag{40} $$

Of course a value different from 1 can be given to the constant $k$, and this
will propagate without changing the meaning of the equations. The constant $k$
would actually increase the flexibility of implementations, offering more
freedom in this direction. Now consider the same equation for the move category
with the best information gain.

It means

$$ P_{c_{BestGain}} = \frac{\Delta Entropy_{BestGain}}{\Delta Cost} \tag{41} $$

Assuming the moves from the best category, the most informationally efficient
ones, will always be executed in the search, the following condition must be
satisfied:

$$ P_{c_{BestGain}} = 1 \tag{42} $$

Then

$$ \frac{\Delta Entropy_{BestGain}}{\Delta Cost} = 1 \tag{43} $$

so

$$ \Delta Cost = \Delta Entropy_{BestGain} \tag{44} $$

The cost of executing either of the two moves is the same. Equating this
cost, it results

$$ \frac{\Delta Entropy_{category_i}}{P_{c_i}} = \Delta Entropy_{BestGain} \tag{45} $$

It means

$$ P_{c_i} = \frac{\Delta Entropy_{category_i}}{\Delta Entropy_{BestGain}} \tag{46} $$

which is a very intuitive result.

In general, for a trajectory $i$, the probability of the trajectory being
explored should be in this system

$$ P_{trajectory_i} = \frac{\Delta Entropy_{trajectory_i}}{\Delta Entropy_{BestTrajectory}} \cdot \frac{\Delta Cost_{BestTrajectory}}{\Delta Cost_{trajectory_i}} \tag{47} $$
4.4 The ERS*, the entropy reduction search in computer
chess

Let $P_{c_i}$ be the probability that a move is executed and one more ply is
added to the search.

The size of the ply added should be a function of this probability. It is
logical to consider the size of the ply as a quantity increasing with the
probability of the move not being executed. The probability of the move not
being executed is $1 - P_{c_i}$. Therefore, assuming as an abstraction a linear
relation of the form

size of ply = k * (probability of a move not being executed),

the relation between the size of the ply $D$ and the probability of the move
being chosen is, for $k = 1$,

$$ D = 1 - P_{c_i} = 1 - \frac{\Delta Entropy_{category_i}}{\Delta Entropy_{BestGain}} \tag{48} $$

This may even be considered a theorem describing the size of the fractional
ply in computer chess, and even in other EXPTIME problems, under the above
assumptions and resulting from the above calculations.

Starting from the previous equation, it is possible to use the relative
entropies of pieces and positional patterns to implement the formula.

Consider the check as the move with the ultimate decrease in entropy, because
of its forceful nature and because it has a higher frequency in the vicinity of
the objective, the mate, than any other move. Then all the other moves may be
rated as a function of the check move. Let this value be log 30. A constant
reflecting the above mentioned properties of such a move can be used here. It
must be noted that not all checks are equally significant, so several
categories of checks can be introduced instead of a single check category. In
the application, too, not all checks are equally important: a check that is
also a capture, for example, gains a better priority, but in this example it
does not have a smaller depth.

As a consequence, if the normal increase in search depth is counted as 1 for
moves without significance, the fractional ply for a check is

$$ D = 1 - \frac{\log 30}{\log 30} \tag{49} $$

so $D = 0$ in this system, because the best move should always be executed
and the depth added should then be 0.

For a capture of the queen the entropy rate of the system decreases by log 28.
Then the fractional ply for a queen capture is

$$ D = 1 - \frac{\log 28}{\log 30} \tag{50} $$

which gives $D \approx 0.02$.

For a capture of a rook the entropy rate of the system decreases by log 14.
Then the fractional ply for a rook capture is

$$ D = 1 - \frac{\log 14}{\log 30} \tag{51} $$

which gives $D = 1 - 0.776 = 0.224$.

Instead of using the entropy rates for calculating the size of the fractional 
depth it is possible to use the value of pieces which is strongly correlated for 
most of the systems with the entropy rate of the pieces. As it can be seen from 
the calculation above, the higher the differences in entropy between consecutive 
positions in a variation, the higher the information gain. This can be understood 
as a divergence between distributions of consecutive moves. The more they 
diverge the higher the information gain after a move. 
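The fractional plies for a check, a queen capture and a rook capture can be reproduced with a few lines of code. This is a sketch under the text's own assumptions (entropy reductions log 30, log 28 and log 14, and k = 1); the function name `fractional_ply`, the `step` parameter and the "quiet move" entry with zero entropy reduction are added for illustration.

```python
import math

BEST_GAIN = math.log(30)   # entropy reduction assigned to a check in the text

def fractional_ply(entropy_reduction, best_gain=BEST_GAIN, step=1.0):
    """Size of the ply added: step * (1 - entropy_reduction / best_gain)."""
    probability = entropy_reduction / best_gain   # probability of executing the move
    return step * (1.0 - probability)

plies = {
    "check": fractional_ply(math.log(30)),
    "queen capture": fractional_ply(math.log(28)),
    "rook capture": fractional_ply(math.log(14)),
    "quiet move": fractional_ply(0.0),            # hypothetical move with no reduction
}

for move, size in plies.items():
    print(f"{move}: {size:.3f}")
```

Replacing the logarithms by piece values, as suggested in the text, only changes the arguments passed to the function.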



4.5 Experimental results 

As a test case, a combination is used, which makes it possible to define the
quality of the response to a position in a precise way. The meaning of the
columns is the following:

column 1: EXPERIMENT NUMBER - the number of the search experiment

column 2: NODES SEARCHED - the number of nodes searched in the experiment

column 3: TERM DIVIDING THE REDUCTION IN PLY - the number dividing the term
that decreases the size of the normal ply added to the current depth

column 4: MAX DEPTH ATTAINED - the maximum depth attained in standard plies,
where 1 is added for each ply

column 5: MAX UNIFORM DEPTH - the maximum allowed depth in the partial depth
scheme, considering a step of 6 decreased by a value depending on the quality
of the move

column 6: SOLVED OR NOT - 1 if the case has been solved with the parameters
from the other columns

column 7: STEP SIZE - the number added to the partial depth for each new
level of search in the case of moves without importance

The following is the table with the results of the search experiments:

column 1   column 2   column 3   column 4   column 5   column 6   column 7
1          20827      1          17         16         1          6
2          1080       1.25       8          16         0          6
3          88532      1.25       12         22         0          6
4          139545     1.25       12         24         0          6
5          155130     1.25       14         26         0          6
6          291714     1.25       14         28         0          6
7          82208      1.25       16         30         1          6
8          311166     1.5        12         32         0          6
9          494560     1.5        13         34         0          6
10         1009407    1.5        14         36         0          6
11         208423     1.5        15         38         1          6
12         1821489    1.75       13         40         0          6
13         2337740    1.75       14         42         0          6
14         381146     1.75
4.6 Interpretation of the experimental results

4.6.1 (i) The step of the search

At first a step representing a fraction of 1 was used. However, better results
were obtained by using a step bigger than 1 for less interesting moves.
The cause is the decrease in the sensitivity of the output and of other search
dependent parameters to variations of other parameters and of the positional
configurations.

4.6.2 (ii) The importance of detecting decisive moves early 

The detection of the variation leading to the objective early decreases the num- 
ber of nodes searched very much. The fact that the mate has been found at 
13 plies depth after only 20000 nodes searched shows the line to mate has been 
one of the first lines tried at each level, even without using knowledge. As it 
can be seen from the table if the mate is detected relatively fast the number 
of nodes searched is more than 10 times smaller. The next plot shows this. 
The maximums in the number of nodes represents the configurations ( a set of 
parameters ) for which the mate has not been fast detected. The OX represents 
the number dividing the factor giving importance to some significant moves and 



16 



on OY it is represented the number of nodes searched. 

[Plot: Increase in the number of nodes when the importance given to moves with
high information gain is decreased.]

The OX axis represents the virtual depth. The OY axis represents the number of
nodes.

As can be seen, even a deeper search that detects the decisive line explores
fewer nodes than a shallower search that does not find it. For this heuristic
and for most of the combinations, when the mate or a strongly dominant line is
found fast, the number of nodes searched drops by as much as a factor of 10,
even if the uniform search is parametrized for a higher depth.

4.6.3 (iii) The effect of the importance given to high information 
lines 

The number of nodes to be searched increases sharply as less importance is
given to important moves and to lines of high informational value.

The following plot, based on data from the previous table, shows how the number
of nodes explored increases as the importance given to information gain
decreases, for the cases in which the solution is found. The less importance is
given to information-gaining moves and lines, the more nodes must be searched
in order to find the solution.

[Plot: The effects of decrease in the importance given to high information
lines. OX: decrease factor of the reduction in ply size, from 1.5 to 5.]

The OX axis represents the term dividing the reduction in ply, that is, the
number that divides the term decreasing the size of the normal ply added to the
current depth. The OY axis represents the number of nodes. The plot shows the
explosion in the number of nodes required to find a solution when the
importance given to high information lines is decreased: because information
is given less weight, the depth of search, and with it the number of nodes
searched, must be increased to find the solution.

The following plot has the same meaning, but for the cases in which the
solution is not found.

[Plot: Decrease of importance of information gain moves vs. increase in nodes.
OX: reduction of importance to information gaining moves.]



The plot of nodes searched versus depth shows a far less pronounced
combinatorial explosion when the solution is detected fast than when it is not
found. The number of nodes required grows as the importance given to high
information lines is decreased, and it grows even faster when the decisive line
is not detected. For a high depth of search, the search cost explodes when no
decisive move is found reasonably fast.

When less importance is given to moves with high information gain, the number
of plies has to be increased to compensate, and the number of nodes explodes
with the number of plies. The plot shows the increase in depth made necessary
when the importance of high information gain moves is decreased.

For the cases in which the problem is solved, the plot is:

[Plot: Decrease of importance of information gain moves vs. plies. OX:
reduction of importance to information gaining moves.]

We can now analyze the data for the cases in which the solution is not reached.
The plot is similar, but the search to the respective depth is carried out at a
far greater cost than when the solution is found fast:

[Plot: Decrease of importance of information gain moves vs. plies.]

4.6.4 (iv) The maximum depth and the importance given to high
information lines

The maximum depth achieved decreases as less importance is given to the areas
of the tree with high information. If less importance is given to moves with
high information gain, more resources are needed to attain a given maximum
depth. This is the case when solving some combinations. The following plot
shows the maximum depth versus the importance given to information gain.

[Plot: Decrease of importance of information gain moves vs. maximum depth. OX:
reduction of importance to information gaining moves.]



As can be seen from the previous plot, the maximal depth was also reached when
the solution was not found but, as can be observed from the table and plots
above, at an ever increasing cost.

For the search experiments in which the solution was not found, the highest
depth remains the same, but the cost of the resources needed to sustain that
depth increases very fast, faster than in the previous plot, where the solution
was found.

[Plot: Decrease of importance of information gain moves vs. maximum depth. OX:
reduction of importance to information gaining moves.]



4.6.5 (v) The depth of search and the objective 

The search detects the mate even if the maximum depth is just one ply greater
than the length of the combination. Even if the maximum depth is kept constant,
at a far greater cost, the searches are less likely to find the decisive lines,
as can be seen from the plots.

4.6.6 (vi) The effect of partial depth on the distribution of depths 

As the search was changed and less importance was given to interesting moves,
the range in the length of the variations became smaller, because less energy
was allocated to the most informative search lines than before and more energy
to the less informative lines. After ever more resources were shifted from the
informative lines to other lines, the objective, the solution of the
combination, was no longer attained: the best variations no longer had the
critical energy needed to penetrate deep enough into the state space and solve
the problem, and the weaker lines were not feasible as a path to any acceptable
solution. This shows the fundamental effect of resource allocation, and how
marginal shifts in resources can lead, in this context, to completely different
results. If the same depth increase were used for each move, thereby allocating
the resources uniformly to the variations, only a supercomputer could go as
deep as needed to find the solution to this combination, which is not among the
deepest. With the introduction of knowledge and heuristics, much greater
performance would be possible. The experiment concentrates on one heuristic,
and its effect on the search is highly significant.

4.6.7 (vii) On deep combinations

In order to solve deep combinations in which some responses are not forced, a
program must have specialized chess knowledge (or an extension of the
information theoretical model of computer chess to all of chess theory), so
that it can give importance to variations without active moves but with
significant tactical maneuvering between forceful moves such as checks and
captures.

5 Discussion 

5.1 General considerations 

Stochastic modeling in computer chess. In the context of game theory, chess is
a deterministic game. The practical side of decision making in chess and
computer chess, however, has many probabilistic elements. The decision is
deterministic, but the system that takes the decision is not: it is a
stochastic system. The human decision-making system and its features, such as
perception and brain processes, are known to be stochastic systems. In computer
chess, many of the search processes are also stochastic, as has been seen from
the previous examples.

5.2 The scope of the results 

The implications of the information theoretic model in terms of heuristic
development are discussed in this paper. The extension of the model to more
elements of computer chess is left for future research.

5.3 The limitations of the model 

The limitations of the model are given by the ability to detect the information
gain resulting from different moves and to quantify it.

5.4 Outlook

The objective for future research is to explain other methods from computer
chess using the information theoretic model. Applications to other strategy
games are also a future objective.

5.5 Conclusion 

The model starts from the axiomatic framework of information theory and
describes formally the role of information in the efficiency and effectiveness
of the heuristics used in computer chess and other strategy board games. The
proposed model considers information, in its formal information theoretical
meaning, as the objective of exploration and as the essential factor in the
quality of decisions in chess, in computer chess, and in other similar games.
The partial depth scheme, well known in practice, has been described
mathematically by observing the fundamental fact that information gain is the
criterion that determines the decrease in the uncertainty of the position. The
uncertainty of the position is described mathematically through the concept of
entropy. The information gain describes, in information theoretic terms, the
decrease in uncertainty that results from making a move. In this way, search
information is quantified. Entropy is understood here as in information theory,
but parallels with thermodynamics can also be drawn. Previous approaches relied
on intuitive formulas and on descriptions of the best moves in terms of
"interestingness", in terms of chess theory, or using knowledge extracted from
the games of strong players. The approach proposed here is different in that it
explains an important method, the fraction ply method, using mathematical
methods and formulas that can be derived from the axioms of information theory,
and it determines important coefficients such as the fraction ply associated
with a move. NxN chess is a generalization of 8x8 chess. It can be expected
that the proposed approach would give a general method for the NxN problem,
where specialized knowledge is not available, and would also provide a method
to analyze other EXPTIME-complete problems, which can be transformed into NxN
chess. The method provides a new understanding of chess, a game analyzed
scientifically before by Norbert Wiener, John von Neumann, Alan Turing, Claude
Shannon, Richard Bellman and other famous scientists. The proposed method
generalizes previous approaches and grounds them in information theory, a field
with a strong axiomatic foundation. It can be expected that the method can
provide an example of how to quantify search for difficult problems from
classes of high complexity, and connect search in computer science to physics
through the common concept of entropy.

Acknowledgment: The author acknowledges with thanks his discussions with
Alberto Giovanni Busetto and Prof. J. Buhmann.

6 APPENDIX

A Code 

// Alpha-beta search in negamax form with partial (fractional) depths.
// move, evaluation(), generator(), isCheck(), is_legal_w(), is_legal_n(),
// copy(), maxDepth, maxExtension, inf_plus, inf_neg and cutoff are defined
// elsewhere in the program.
double minimax(double alfa, double beta, int depth, int k, int type, move mv,
               double previousval, double virtualdepth) {
    move* listNewMoves = (move*) new move[100];
    move mr; double value = 0, temp = 0, ev = 0; int c, number;

    // Stop when the virtual (partial) depth or the real depth limit is reached.
    if (virtualdepth >= maxDepth || depth >= maxExtension) {
        return evaluation(type, mv);
    } else {
        if (type == 1) { value = -10000; }
        else { value = 10000; }

        generator(mv, listNewMoves, number);
        // Score each move by the absolute change in evaluation it produces;
        // checks receive a large bonus so that they are tried first.
        for (int i = 1; i <= number; i++) {
            listNewMoves[i].eval = fabs(evaluation(type, listNewMoves[i]) - previousval);
            double b = -1;
            if (isCheck(listNewMoves[i]))
                listNewMoves[i].eval += 10000;
        }

        if (number == 0) {
            // No moves were generated.
            if (type == 1) {
                if (!is_legal_w(mv)) return inf_plus;
                else return 0;
            } else {
                if (!is_legal_n(mv)) return inf_neg;
                else return 0;
            }
        } else
            for (int k1 = 1; k1 <= number; k1++) {
                // Select the not yet searched move with the highest score.
                double max = -1;
                int ic = -1;
                for (int c = 1; c <= number; c++) {
                    double comp = listNewMoves[c].eval;
                    if (comp > max) {
                        max = listNewMoves[c].eval;
                        ic = c;
                    }
                }
                mr.eval = listNewMoves[ic].eval;
                double evalPosition = listNewMoves[ic].eval;
                listNewMoves[ic].eval = -2;  // mark the move as searched
                copy(mr.move, listNewMoves[ic].move);
                copy(mr.tabla, listNewMoves[ic].tabla);  // tabla: the board
                mr.turn = listNewMoves[ic].turn;
                double nextV = evaluation(type, mr);
                if (evalPosition > 2000)
                    // Checks are searched without increasing the virtual depth.
                    value = -minimax(-beta, -alfa, depth + 1, ic, -type, mr, nextV, virtualdepth);
                else {
                    // Partial depth: the larger the term add, the smaller the ply.
                    double add = log(fabs(0.1 + (evalPosition / 100))) / (log(10.0)) + 5.0 / log(number + 2);
                    value = -minimax(-beta, -alfa, depth + 1, ic, -type, mr, nextV, virtualdepth + 6 - add);
                }
                if (value >= alfa) alfa = value;
                if (alfa >= beta) {
                    cutoff++;
                    break;
                }
            }
        return alfa;
    }
}
B Brief description of some chess concepts



The reason for presenting some concepts of chess theory. Some of the concepts
of chess are useful in understanding the ideas of the paper. Regardless of the
reader's level of knowledge and skill in mathematics, without a minimal
understanding of important chess concepts it may be difficult to follow the
arguments. Neither a vast knowledge of chess nor a very high level of chess
calculation skill is essential in what follows. However, some understanding of
the decision process in human chess, of how masters decide on a move, is
important for understanding the theory of chess and computer chess presented
here. The theory also presents chess knowledge from a new perspective, assuming
that decisions in human chess are also based on information gained during
positional analysis. An account of the method used by chess grandmasters when
deciding on a move is given in a very well regarded chess book [7].

Combination. In chess, a combination is a tree of variations that contains only
or mostly tactical and forceful moves and at least one sacrifice, and that
results in a material or positional advantage, or even in checkmate, without
the adversary being able to prevent its outcome. The following is the starting
position of a combination.




The problem is to find the solution, that is, the moves leading to the
objective of the game, the mate.

The objective of the game. The objective of the game is to reach a position in
which the adversary has no legal move and his king is under attack. For
example, a mate position resulting from the previous position is:

The concept of variation. A variation in chess is a string of consecutive moves
from the current position. The problem is to find the variation from the start
position to mate.

To make it impossible for the adversary to escape the mate, it is desirable to
find a variation that prevents him from doing so, restricting his range of
options as much as possible with the threat of decisive moves.

Forceful variation. A forceful variation is a variation in which each move of
one player leaves the adversary a limited number of legal or feasible options,
forcing him to react to an immediate threat.

The solution to the problem, which is also one of the test cases, is the
following:

1. Q-N6 ch!; PxQ 2. BxQNP ch; K-B1 3. R-QB7 ch; K-Q1 4. R-B7 ch; K-B1
5. RxR ch; Q-K1 6. RxQ ch; K-Q2 7. R-Q8 mate

Attack on a piece. In chess, an attack on a piece is a move that threatens to
capture the attacked piece on the very next move. For example, after the first
move, a surprising one, White's most valuable piece is under attack by Black's
pawn.

The concept of sacrifice in chess. A sacrifice in chess is a capture or a move
with a piece made in the knowledge that the piece could be captured on the next
turn. If the player loses a piece without realizing that it could be lost, it
is a blunder, not a sacrifice. In a sacrifice, the player is aware that the
piece may be captured but has a plan which, once carried out, would give him an
advantage or might even win the game. For example, Black's reply in the
forceful variation shown is to capture the queen. While this is not the only
possible option, all other options lead to a faster defeat for the defending
side. The solution requires 7 double moves, or 13 plies of search in depth.

C The axiomatic model of information theory



C.1 Axioms of information theory

The entropy as an information theoretic concept may be defined in a precise
axiomatic way [33].

Let a sequence of symmetric functions $H_m(p_1, p_2, p_3, \ldots, p_m)$ satisfy
the following properties:

(i) Normalization:

\[ H_2\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = 1 \tag{52} \]

(ii) Continuity:

\[ H_2(p, 1 - p) \tag{53} \]

is a continuous function of $p$.

(iii)

\[ H_m(p_1, p_2, \ldots, p_m) = H_{m-1}(p_1 + p_2, p_3, \ldots, p_m) + (p_1 + p_2) \, H_2\left(\frac{p_1}{p_1 + p_2}, \frac{p_2}{p_1 + p_2}\right) \tag{54} \]

It follows that $H_m$ must be of the form

\[ H_m = -\sum_{x} p(x) \log p(x) \tag{55} \]